9 research outputs found
Scene Parsing using Multiple Modalities
Scene parsing is the task of assigning a semantic class
label to the elements of a scene. It has many applications in
autonomous systems when we need to understand the visual data
captured from our environment. Different sensing modalities, such
as RGB cameras, multi-spectral cameras and Lidar sensors, can be
beneficial when pursuing this goal. Scene analysis using
multiple modalities aims at leveraging complementary information
captured by multiple sensing modalities. When multiple modalities
are used together, the strength of each modality can combat the
weaknesses of other modalities. Therefore, working with multiple
modalities enables us to use powerful tools for scene analysis.
However, possible gains of using multiple modalities come with
new challenges such as dealing with misalignments between
different modalities. In this thesis, our aim is to take
advantage of multiple modalities to improve outdoor scene parsing
and address the associated challenges. We initially investigate
the potential of multi-spectral imaging for outdoor scene
analysis. Our approach is to combine the discriminative strength
of the multi-spectral signature in each pixel and the
corresponding nature of the surrounding texture. Many materials
that appear similar when viewed by a common RGB camera will show
discriminating properties when viewed by a camera capturing a
greater number of separated wavelengths. When using imagery data
for scene parsing, a number of challenges stem from, e.g., color
saturation, shadow and occlusion. To address such challenges, we
focus on scene parsing using multiple modalities, panoramic RGB
images and 3D Lidar data in particular, and propose a multi-view
approach to select the best 2D view that describes each element
in the 3D point cloud data. Keeping our focus on using multiple
modalities, we then introduce a multi-modal graphical model to
address the problems of scene parsing using 2D-3D data exhibiting
extensive many-to-one correspondences. Existing methods often
impose a hard correspondence between the 2D and 3D data, where
the 2D and 3D corresponding regions are forced to receive
identical labels. This results in performance degradation due to
misalignments, 3D-2D projection errors and occlusions. We address
this issue by defining a graph over the entire set of data that
models soft correspondences between the two modalities. This
graph encourages each region in a modality to leverage the
information from its corresponding regions in the other modality
to better estimate its class label. Finally, we introduce latent
nodes to explicitly model inconsistencies between the modalities.
The latent nodes allow us not only to leverage information from
various domains in order to improve the labeling of the
modalities, but also to cut the edges between inconsistent
regions. To eliminate the need for hand tuning the parameters of
our model, we propose to learn potential functions from training
data. In addition, to demonstrate the benefits of the proposed
approaches on publicly available multi-modality datasets, we
introduce a new multi-modal dataset of panoramic images and 3D
point cloud data captured from outdoor scenes (NICTA/2D3D
Dataset).
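The multi-modal graphical model with latent nodes lends itself to an energy-based formulation. The following is only an illustrative sketch of what such a CRF energy could look like; the potentials (phi, psi, xi) and the latent variables h are notation assumed here for exposition, not taken verbatim from the thesis:

```latex
% Illustrative multi-modal CRF energy: unary terms per modality,
% pairwise smoothness within each modality, and cross-modal terms
% routed through latent nodes h_{ij} that can "cut" the edge between
% an inconsistent 2D/3D region pair instead of forcing equal labels.
E(\mathbf{y}^{2D}, \mathbf{y}^{3D}, \mathbf{h}) =
    \sum_{i} \phi_i\!\left(y_i^{2D}\right)
  + \sum_{j} \phi_j\!\left(y_j^{3D}\right)
  + \sum_{(i,i')} \psi_{ii'}\!\left(y_i^{2D}, y_{i'}^{2D}\right)
  + \sum_{(j,j')} \psi_{jj'}\!\left(y_j^{3D}, y_{j'}^{3D}\right)
  + \sum_{(i,j)} \xi_{ij}\!\left(y_i^{2D}, y_j^{3D}, h_{ij}\right)
```

The last sum runs over soft 2D-3D correspondences; learning the potential functions from training data, as the thesis proposes, removes the need to hand-tune their weights.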
Sample and Filter: Nonparametric Scene Parsing via Efficient Filtering
Scene parsing has attracted a lot of attention in computer vision. While
parametric models have proven effective for this task, they cannot easily
incorporate new training data. By contrast, nonparametric approaches, which
bypass any learning phase and directly transfer the labels from the training
data to the query images, can readily exploit new labeled samples as they
become available. Unfortunately, because of the computational cost of their
label transfer procedures, state-of-the-art nonparametric methods typically
filter out most training images to only keep a few relevant ones to label the
query. As such, these methods throw away many images that still contain
valuable information and generally obtain an unbalanced set of labeled samples.
In this paper, we introduce a nonparametric approach to scene parsing that
follows a sample-and-filter strategy. More specifically, we propose to sample
labeled superpixels according to an image similarity score, which allows us to
obtain a balanced set of samples. We then formulate label transfer as an
efficient filtering procedure, which lets us exploit more labeled samples than
existing techniques. Our experiments evidence the benefits of our approach over
state-of-the-art nonparametric methods on two benchmark datasets.
Comment: Please refer to the CVPR-2016 version of this manuscript.
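The sample-and-filter strategy can be illustrated with a small sketch. Everything below — the toy superpixel features, the per-class balanced sampler, and the Gaussian-weighted voting used as the "filtering" step — is a stand-in chosen for illustration, not the paper's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for labeled training superpixels: a feature vector
# and a class label each. Classes are shifted apart in feature space
# so that label transfer has a chance of recovering them.
n_train, dim, n_classes = 200, 8, 3
train_feats = rng.normal(size=(n_train, dim))
train_labels = rng.integers(0, n_classes, size=n_train)
for c in range(n_classes):
    train_feats[train_labels == c] += 3.0 * c

def sample_balanced(feats, labels, per_class, rng):
    """Sample an equal number of superpixels per class (a 'balanced set')."""
    idx = []
    for c in np.unique(labels):
        pool = np.flatnonzero(labels == c)
        idx.extend(rng.choice(pool, size=min(per_class, pool.size),
                              replace=False))
    idx = np.asarray(idx)
    return feats[idx], labels[idx]

def filter_transfer(query, samp_feats, samp_labels, n_classes, sigma=1.0):
    """Label transfer as filtering: every sampled superpixel votes for
    its label with a Gaussian weight decaying in feature distance."""
    d2 = ((samp_feats - query) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    scores = np.bincount(samp_labels, weights=w, minlength=n_classes)
    return int(scores.argmax())

samp_f, samp_l = sample_balanced(train_feats, train_labels, 20, rng)
query = train_feats[train_labels == 2][0]   # a superpixel of class 2
pred = filter_transfer(query, samp_f, samp_l, n_classes)
print(pred)   # on this well-separated toy data, class 2 is recovered
```

Because every sampled superpixel contributes a (down-weighted) vote, many more labeled samples participate in the transfer than under a hard retrieve-top-K scheme, which is the point the paper makes against retrieval-based nonparametric methods.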
Classification of materials in natural scenes using multi-spectral images
In this paper, a method suitable for distinguishing between different materials occurring in natural scenes using a multi-spectral camera is devised. Such a capability is useful in autonomous robot applications to help negotiate the environment as well…
Classification of natural scene multi spectral images using a new enhanced CRF
In this paper, a new enhanced CRF for discriminating between different materials in natural scenes using terrestrial multi-spectral imaging is established. Most of the existing formulations of the CRF often suffer from over-smoothing and loss of small details…
Multi-view terrain classification using panoramic imagery and LIDAR
The focus of this work is addressing the challenges of performing object recognition in real-world scenes as captured by a commercial, state-of-the-art surveying vehicle equipped with a 360° panoramic camera in conjunction with a 3D laser scanner (LIDAR)…
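When a surveying vehicle captures many overlapping panoramas, a multi-view pipeline must decide which 2D view best describes each 3D Lidar point. The sketch below is a minimal illustration only, using distance to the capture position as a stand-in for the actual view-selection criterion:

```python
import numpy as np

def best_view(point, cam_positions):
    """For one 3D point, pick the panorama captured closest to it.
    Proximity is a simple proxy for 'best describing view'; a real
    pipeline would also check visibility/occlusion and viewing angle."""
    d = np.linalg.norm(cam_positions - point, axis=1)
    return int(d.argmin())

# Three panorama capture positions along the vehicle trajectory:
cams = np.array([[0.0, 0.0, 0.0],
                 [10.0, 0.0, 0.0],
                 [20.0, 0.0, 0.0]])
p = np.array([9.0, 1.0, 0.5])
print(best_view(p, cams))   # nearest panorama index
```

Selecting one best 2D view per 3D point, rather than averaging all views, avoids blending appearance from viewpoints where the point is distant, foreshortened, or occluded.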
A Multi-modal Graphical Model for Scene Analysis
In this paper, we introduce a multi-modal graphical model to address the problem of semantic segmentation using 2D-3D data exhibiting extensive many-to-one correspondences. Existing methods often impose a hard correspondence between the 2D and 3D data, where the 2D and 3D corresponding regions are forced to receive identical labels. This results in performance degradation due to misalignments, 3D-2D projection errors and occlusions. We address this issue by defining a graph over the entire set of data that models soft correspondences between the two modalities. This graph encourages each region in a modality to leverage the information from its corresponding regions in the other modality to better estimate its class label. We evaluate our method on a publicly available dataset and outperform the state of the art. Additionally, to demonstrate the ability of our model to support multiple correspondences for objects in the 3D and 2D domains, we introduce a new multi-modal dataset, which is composed of panoramic images and LIDAR data, and features a rich set of many-to-one correspondences.
Deep phenotyping: Deep learning for temporal phenotype/genotype classification
Background
High resolution and high throughput genotype-to-phenotype studies in plants are underway to accelerate the breeding of climate-ready crops. In recent years, deep learning techniques, in particular Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs), have shown great success in visual data recognition, classification, and sequence learning tasks. More recently, CNNs have been used for plant classification and phenotyping, using individual static images of the plants. On the other hand, the dynamic behavior of plants, as well as their growth, has been an important phenotype for plant biologists, and this motivated us to study the potential of LSTMs in encoding this temporal information for the accession classification task, which is useful in the automation of plant production and care.
Methods
In this paper, we propose a CNN-LSTM framework for plant classification of various genotypes. Here, we exploit the power of deep CNNs for automatic joint feature and classifier learning, compared to using hand-crafted features. In addition, we leverage the potential of LSTMs to study the growth of the plants and their dynamic behaviors as important discriminative phenotypes for accession classification. Moreover, we collected a dataset of time-series image sequences of four accessions of Arabidopsis, captured in similar imaging conditions, which could be used as a standard benchmark by researchers in the field. We made this dataset publicly available.
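To make the role of the LSTM concrete, the sketch below runs a single-layer LSTM forward pass (in plain NumPy) over a sequence of per-frame feature vectors — stand-ins for the CNN features of a plant image sequence — and classifies the accession from the final hidden state. All dimensions, the random weights, and the linear classifier head are illustrative assumptions, not the paper's trained model:

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(xs, W, U, b):
    """Run a single-layer LSTM over a sequence of feature vectors xs
    (one per frame) and return the final hidden state.
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,)."""
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    for x in xs:
        z = W @ x + U @ h + b
        i, f, o, g = np.split(z, 4)        # input, forget, output, candidate
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g                  # cell state carries temporal memory
        h = o * np.tanh(c)
    return h

# Stand-in for per-frame CNN features of a time-series plant sequence:
T, D, H, n_classes = 10, 16, 8, 4          # 4 accessions, as in the dataset
frames = rng.normal(size=(T, D))

W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
Wc = rng.normal(scale=0.1, size=(n_classes, H))   # linear classifier head

h_final = lstm_forward(frames, W, U, b)
logits = Wc @ h_final
print("predicted accession:", int(logits.argmax()))
```

In the full framework, a CNN would produce each frame's feature vector and the CNN, LSTM, and classifier would be trained jointly; the point of the sketch is only how the recurrent state lets growth and dynamic behavior across frames influence the final prediction.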
Conclusion
The results provide evidence of the benefits of our accession classification approach over using traditional hand-crafted image analysis features and other accession classification frameworks. We also demonstrate that utilizing temporal information using LSTMs can further improve the performance of the system. The proposed framework can be used in other applications, such as plant classification given the environmental conditions or distinguishing diseased plants from healthy ones.

We thank funding sources including the Australian Research Council (ARC) Centre of Excellence in Plant Energy Biology CE140100008, ARC Linkage Grant LP140100572, and the National Collaborative Research Infrastructure Scheme - Australian Plant Phenomics Facility.